Self-Organizing Maps of Document Collections: A New Approach to Interactive Exploration

Authors

  • Krista Lagus
  • Timo Honkela
  • Samuel Kaski
  • Teuvo Kohonen
Abstract

Powerful methods for interactive exploration and search from collections of free-form textual documents are needed to manage the ever-increasing flood of digital information. In this article we present a method, WEBSOM, for automatic organization of full-text document collections using the self-organizing map (SOM) algorithm. The document collection is ordered onto a map in an unsupervised manner utilizing statistical information of short word contexts. The resulting ordered map where similar documents lie near each other thus presents a general view of the document space. With the aid of a suitable (WWW-based) interface, documents in interesting areas of the map can be browsed. The browsing can also be interactively extended to related topics, which appear in nearby areas on the map. Along with the method we present a case study of its use.
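
To make the idea of ordering documents onto a map concrete, the following is a minimal sketch of a generic self-organizing map in plain Python. It is not the WEBSOM implementation: the function names (train_som, best_matching_unit), the toy document vectors, and the linear decay schedules are all assumptions chosen for illustration, and the word-context encoding that the abstract mentions is not reproduced here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the map unit whose weight vector is closest to x."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on a list of equal-length vectors (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    # Random initial weight vector for every map unit.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                      # assumed linear decay
            radius = max(radius0 * (1 - frac), 0.5)    # assumed linear shrink
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the map grid.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Toy "documents" as made-up 3-term frequency vectors (two rough topics).
docs = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.8, 0.0, 0.1],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 0.8],
]
som = train_som(docs, rows=3, cols=3)
positions = [best_matching_unit(som, d) for d in docs]
```

Each entry of positions is the map coordinate assigned to a document. In a sketch like this, documents with similar vectors tend to be assigned to the same or neighboring units, which is the property a browsing interface would rely on.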

Similar resources

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

Information Visualization with Self-Organizing Maps

The Self-Organizing Map (SOM) is an unsupervised neural network algorithm that projects high-dimensional data onto a two-dimensional map. The projection preserves the topology of the data so that similar data items will be mapped to nearby locations on the map. Despite the popular use of the algorithm for clustering and information visualisation, a system has been lacking that combines the fast ...

WEBSOM - Self-organizing maps of document collections

Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEB...

Exploration of Text Collections with Hierarchical Feature

Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classificat...

Exploration of Document Collections with Self-Organizing Maps: A Novel Approach to Similarity Representation

Classification is one of the central issues in any system dealing with text data. The need for effective approaches is dramatically increased nowadays due to the advent of massive digital libraries containing free-form documents. What we are looking for are powerful methods for the exploration of such libraries whereby the detection of similarities between the various text documents is the overal...

Publication date: 1996